How to {ggplot2}

Carlina Feldmann
Lennart Oelschläger

Last rendered on 14.09.2022

Why and what

Welcome to this tiny course on data visualization in R with {ggplot2}! 👋

Why do we care?

Plots can beautifully inform or horribly mislead. Colors and shapes matter! ⚖️

Why {ggplot2}?

The {ggplot2} package implements a grammar of graphics: every graphic is built from a series of distinct steps (data, aesthetic mappings, geoms, scales, labels).

What is this course about?

Being in decent control of {ggplot2} to produce meaningful plots.

What do you need?

Basic R skills + a not-too-old version of R (>= 4.0.0) + RStudio

At the end of the day…

Sources

Found mistakes? Have suggestions?

I’m sure you have! Please leave a note here. 🙏

Our first plot

Load {ggplot2}.

# install.packages("ggplot2")
library(ggplot2)

We need data. Let’s go with an excerpt from the famous Gapminder dataset:

# install.packages("gapminder")
library(gapminder)
head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
str(gapminder)
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##  $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num [1:1704] 779 821 853 836 740 ...

First, we tell the ggplot() function what data we use and what variables we wish to see on each axis:

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) 

Something is missing … 🤔 We need an additional layer, a geom_* function!

ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

There are more geoms, and we can simply add them (literally add, with +):

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p <- p + geom_point() + geom_smooth()
p

As a last polishing step for now, we improve the x-axis scale and the plot labels.

p + scale_x_log10(labels = scales::dollar) +
  labs(x = "GDP per capita",
       y = "Life expectancy in years",
       title = "Economic growth as an indicator for life expectancy",
       subtitle = "Data points are country-years",
       caption = "Source: Gapminder")

Finally, we can use the ggsave() function to save our plot:

ggsave("some_descriptive_name.pdf")
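
Called with only a file name, ggsave() saves the last plot that was displayed. The following is a minimal sketch of a more explicit call, assuming we want a PNG of a fixed size; the object name p_scatter, the file name, and the dimensions are illustrative choices, not part of the course material.

```r
library(ggplot2)
library(gapminder)

# Build the plot and keep it in a variable so we can save it explicitly.
p_scatter <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10(labels = scales::dollar)

# Hypothetical output path in a temporary directory.
out_file <- file.path(tempdir(), "gdp_vs_lifeexp.png")

# Pass the plot explicitly instead of relying on the last plot displayed.
# width and height are given in inches unless `units` says otherwise.
ggsave(filename = out_file, plot = p_scatter, width = 8, height = 5, dpi = 300)
```

The file type is inferred from the extension, so replacing ".png" with ".pdf" would produce a PDF instead.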

The {ggplot2} workflow

  1. Call ggplot()
  2. Set data = ...
  3. Set mapping = aes(...)
  4. Add one (or more) geom_*() functions
  5. Adjust the scale and labels
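
The five steps above can be sketched in a single call. This is only an illustration using the Gapminder plot from before; the object name p_workflow and the alpha value are arbitrary choices.

```r
library(ggplot2)
library(gapminder)

p_workflow <- ggplot(                           # 1. call ggplot()
  data = gapminder,                             # 2. set the data
  mapping = aes(x = gdpPercap, y = lifeExp)     # 3. set the aesthetic mapping
) +
  geom_point(alpha = 0.3) +                     # 4. add a geom
  scale_x_log10(labels = scales::dollar) +      # 5. adjust the scale ...
  labs(x = "GDP per capita",                    #    ... and the labels
       y = "Life expectancy in years")
p_workflow
```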

It’s your turn

This course includes tutorials! 😎


Executing the following lines gives you access to the course material:

# install.packages("devtools")
devtools::install_github("loelschlaeger/howtoggplot2")
library(howtoggplot2)

To start the tutorial, type:

practicals()

To open a copy of these slides, type:

slides()

To submit an issue on GitHub about this course, type:

issue()

Facets and more geoms

Our goal is to plot the trajectory of life expectancy over time for each country in the gapminder data.

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line()

We must not forget to group by country! 💡

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(aes(group = country))

But can you make sense of this mess? Luckily, we can split the plot into one panel per continent:

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(aes(group = country)) +
  facet_wrap(~continent)

Better not to facet_wrap(~country)… Let’s polish our plot with the things we already learned:

ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(color = "grey", aes(group = country)) +
  geom_smooth() +
  facet_wrap(~continent) +
  labs(x = "Year",
       y = "Life expectancy",
       title = "Life expectancy over time on five continents")
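
The panel layout chosen by facet_wrap() can also be controlled. As a small sketch, assuming we prefer all continents side by side, the nrow argument forces a single row; the object name p_facets is illustrative.

```r
library(ggplot2)
library(gapminder)

p_facets <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(color = "grey", aes(group = country)) +
  # nrow = 1 places the five continent panels in one row.
  facet_wrap(~continent, nrow = 1) +
  labs(x = "Year", y = "Life expectancy")
p_facets
```

With a single row the year labels on the x-axis may overlap, so this layout tends to work best for wide figures.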

Notice that we supplied a formula to facet_wrap. This can be more advanced, for example (with facet_grid):

ggplot(data = socviz::gss_sm, mapping = aes(x = age, y = childs)) +
  geom_point(alpha = 0.2) +
  geom_smooth() +
  facet_grid(sex ~ race) +
  labs(x = "Age",
       y = "No. of children",
       title = "Relationship between age and number of children",
       subtitle = "Separated by sex (in rows) and race (in columns)")

As a last input for this part, we learn four new geoms.

Bar plots

ggplot(data = socviz::gss_sm, mapping = aes(x = religion)) +
  geom_bar()

Using relative instead of absolute counts on the y-axis is covered in the tutorials.
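
As a preview, here is one possible way to get relative counts, assuming a reasonably recent {ggplot2} (version 3.3.0 or later, where after_stat() is available). The tutorials may take a different route; the object name p_rel and the axis labels are illustrative.

```r
library(ggplot2)

p_rel <- ggplot(data = socviz::gss_sm, mapping = aes(x = religion)) +
  # after_stat(prop) is the proportion computed by the counting stat.
  # group = 1 makes the proportions relative to all rows instead of
  # being computed within each bar (which would give 1 everywhere).
  geom_bar(mapping = aes(y = after_stat(prop), group = 1)) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Religion", y = "Share of respondents")
p_rel
```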

Histograms

ggplot(data = socviz::gss_sm, mapping = aes(x = age)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 10 rows containing non-finite values (stat_bin).

We address the message and the warning in the tutorials.
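
A minimal sketch of one way to handle both, assuming that 5-year bins are a sensible choice for the age variable (the bin width and the object name p_hist are illustrative, and the tutorials may choose differently):

```r
library(ggplot2)

p_hist <- ggplot(data = socviz::gss_sm, mapping = aes(x = age)) +
  # binwidth = 5 sets the bin width explicitly, so no message is printed.
  # na.rm = TRUE drops the rows with missing age without a warning.
  geom_histogram(binwidth = 5, na.rm = TRUE, color = "white")
p_hist
```

Setting na.rm = TRUE only silences the warning; the rows with missing age are removed either way.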

Density plots

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
ggplot(data = filter(gapminder, year == 2007), 
       mapping = aes(x = lifeExp)) +
  geom_density()

Boxplots

ggplot(data = filter(gapminder, year == 2007), 
       mapping = aes(x = pop,
                     y = reorder(continent, pop))) +
  geom_boxplot() +
  scale_x_log10() + 
  labs(y = NULL,
       x = "Populations in 2007")

We look at a variant on the basic boxplot that {ggplot2} offers in the tutorials.
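
Which variant the tutorials use is not stated here. As one example of a boxplot alternative that {ggplot2} provides, the following sketch swaps geom_boxplot() for geom_violin(), which draws a mirrored density for each group; the object name p_violin is illustrative.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

p_violin <- ggplot(data = filter(gapminder, year == 2007),
                   mapping = aes(x = pop,
                                 y = reorder(continent, pop))) +
  # Same mapping as the boxplot above, only the geom changes.
  geom_violin() +
  scale_x_log10() +
  labs(y = NULL,
       x = "Populations in 2007")
p_violin
```

Depending on the {ggplot2} version, continents with very few countries (Oceania has only two in this dataset) may be dropped from a violin plot with a warning.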

Annotations

We can add text annotations to plots via geom_text():

ggplot(data = socviz::elections_historic, 
       mapping = aes(x = popular_pct,
                     y = ec_pct,
                     label = winner_label)) +
  geom_point() +
  geom_text()

This is hard to read. Adjusting the positions by hand is possible, but it is fiddly and not robust. The extension {ggrepel} is designed to do this task for us:

ggplot(data = socviz::elections_historic, 
       mapping = aes(x = popular_pct,
                     y = ec_pct,
                     label = winner_label)) +
  geom_point() +
  ggrepel::geom_text_repel()
## Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
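
The warning itself suggests a remedy. As a sketch, setting max.overlaps = Inf asks {ggrepel} to draw every label regardless of overlaps; the smaller text size and the object name p_labels are illustrative choices to keep the plot legible.

```r
library(ggplot2)

p_labels <- ggplot(data = socviz::elections_historic,
                   mapping = aes(x = popular_pct,
                                 y = ec_pct,
                                 label = winner_label)) +
  geom_point() +
  # max.overlaps = Inf: never drop a label because of too many overlaps.
  ggrepel::geom_text_repel(max.overlaps = Inf, size = 3)
p_labels
```

Drawing every label can still produce a crowded plot, which is one reason to annotate only selected points.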

We can also annotate only selected points (outliers, for example) as follows:

ggplot(data = socviz::elections_historic, 
       mapping = aes(x = popular_pct,
                     y = ec_pct,
                     label = winner_label)) +
  geom_point() +
  ggrepel::geom_text_repel(
    data = filter(socviz::elections_historic, popular_pct < 0.5 & ec_pct > 0.5)
  ) + 
  geom_hline(yintercept = 0.5) +
  geom_vline(xintercept = 0.5)
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

And finally, we can place almost any annotation anywhere we wish via annotate():

ggplot(data = socviz::elections_historic, 
       mapping = aes(x = popular_pct,
                     y = ec_pct,
                     label = winner_label)) +
  geom_point() +
  geom_hline(yintercept = 0.5) +
  geom_vline(xintercept = 0.5) +
  annotate(geom = "rect", xmin = 0, xmax = 0.5, ymin = 0, ymax = 0.5, fill = "red", alpha = 0.2) +
  annotate(geom = "text", x = 0.25, y = 0.25, label = "Some text.")

Draw Maps

R can work with geographical data, and {ggplot2} can make choropleth maps.

world <- map_data("world")
p <- ggplot(data = world, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black")
plot(p)

Instead of plain longitude and latitude axes, we can apply a map projection via coord_map(). Its default is the Mercator projection; here we use the Albers projection:

p + coord_map(projection = "albers", lat0 = 15, lat1 = 45)
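
The map above only shows outlines. A choropleth additionally fills each region according to a variable. The following is a minimal sketch under some assumptions: it uses the USArrests dataset that ships with R (not part of this course) and the US state outlines from the {maps} package, and all object names are illustrative.

```r
library(ggplot2)
library(dplyr)

# USArrests stores the state names as row names; map_data() uses
# lower-case names in its `region` column, so we match that format.
crimes <- data.frame(region = tolower(rownames(USArrests)), USArrests)

# State outlines (requires the {maps} package).
states <- map_data("state")

# left_join() keeps the row order of `states`, which matters because
# geom_polygon() connects the points in the order they appear.
states_crimes <- left_join(states, crimes, by = "region")

p_map <- ggplot(data = states_crimes,
                mapping = aes(x = long, y = lat, group = group, fill = Murder)) +
  geom_polygon(color = "white") +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  labs(fill = "Murder arrests\nper 100,000")
p_map
```

Regions without a match in the data (here the District of Columbia) get a missing fill value and are drawn in grey by default.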

In the tutorials, we will visualize the results of the 2016 Trump vs. Clinton election on a map of the US states.

Challenge

Reproduce this plot!

If you want to see some hints, scroll down this page.
















Don’t forget to install and load the packages {ggplot2} and {dplyr} and load the gapminder dataset.

Hint 1: Use your {dplyr} knowledge to create an extract of the gapminder dataset that only contains values from the year 2007.




Hint 2: Have a look at the 3rd slide of this presentation to copy the basic syntax and remember how to modify the labels.




Hint 3: You can set the size and colour of the points to depend on certain variables in the aesthetics aes().




Hint 4: Have a look at ?guides to modify the legends.

Animations

{ggplot2} itself does not allow for interactive or animated visualizations.

However, there are R packages that achieve this, e.g. {plotly}, {gganimate}, and {shiny}.

library(plotly)
ggplotly(p)
library(gganimate)
library(gifski)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, colour = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10(labels = scales::dollar) +
  guides(size = "none") +
  guides(colour = guide_legend(title = "")) +
  labs(x = "GDP per capita",
       y = "Life expectancy in years",
       title = "Economic growth as an indicator for life expectancy",
       caption = "Source: Gapminder")
p + transition_time(year) +
  labs(title = "Year: {frame_time}")
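
To control the rendering and keep the result as a file, the animation can be passed to animate() and anim_save() from {gganimate}. This is a sketch only: the number of frames, the frame rate, the file name, and the object names are illustrative assumptions, and rendering requires the {gifski} package.

```r
library(ggplot2)
library(gapminder)
library(gganimate)

anim <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                              size = pop, colour = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10(labels = scales::dollar) +
  guides(size = "none") +
  transition_time(year) +
  labs(title = "Year: {frame_time}")

# Render explicitly: 50 frames at 10 frames per second, written as a GIF.
rendered <- animate(anim, nframes = 50, fps = 10,
                    renderer = gifski_renderer())

# Hypothetical output path in a temporary directory.
out_gif <- file.path(tempdir(), "gapminder.gif")
anim_save(out_gif, animation = rendered)
```

Fewer frames render faster, at the cost of a less smooth animation.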